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Abstract’  This  paper  discusses  preliminary  tests  on  using  predicted  forecast  errors  to  estimate  the  impact  of  observations  in  correcting 
the  Nava!  Research  Laboratory  (NRL)  tide  resolving^  high  resolution  regional  version  of  the  Navy  Coastal  Ocean  Model  (RNCOM) 
assimilating  local  observations  processed  through  the  NRL  Coupled  Ocean  Data  Assimilation  (NCODA)  system.  Since  there  will  always  be  a 
shortfall  of  data  to  constraint  all  sources  of  uncertainty  there  is  an  obvious  advantage  to  optimally  guide  observations  to  reduce  model  errors 
that  could  be  producing  the  most  negative  impacts.  The  importance  of  this  topic  has  been  further  heightened  in  oceanic  applications  by  the 
advent  of  Underwater  Automated  Vehicles  (UAVs)  that  can  bring  persistent  observations  but  need  to  be  told  where  to  go  and  when, 
following  regular  schedules.  This  works  tests  a  technique  named  the  Ensemble  Transform  Kalman  Filter  (ETKF)  that  can  be  used  to 
automate  such  adaptive  sampling  guidance  and  has  been  successfully  applied  for  atmospheric  modeling  optimization.  The  ETKF  uses  an 
ensemble  of  state-fields  from  a  certain  initialization  time  and  rapid  low  rank  solutions  of  the  Kalman  filter  equations  to  estimate  integrated 
predicted  error  reduction  for  selected  target  ensemble  variables,  or  combinations  of  variables,  over  areas  and  forecast  ranges  of  interest.  The 
error  estimates  are  produced  through  independent  RNCOM  runs  using  perturbed  forcing  and  initial  conditions  constrained  at  each  analysis 
time  by  new  estimates  of  the  analysis  errors  as  provided  by  NCODA,  using  a  technique  named  Ensemble  Transform  (ET).  The  skills  of  these 
systems  in  tracking  the  RNCOM  forecast  errors  and  predicting  the  reduction  in  forecast  error  from  a  set  o  possible  observations  were  tested 
using  local  profile  measurements  off  the  East  Philippines.  Results  show  areas  of  larger  uncertainty  close  to  the  major  spatial  gradients  as 
one  could  anticipate  and  a  good  accuracy  of  error  estimates  with  an  high  spread-skill  (Le.  ensemble  estimates  had  the  ability  to  correctly 
separate  the  small  ensemble  spread  well  correlated  with  the  smaller  observed  errors  from  the  larger  ensemble  spread  well  correlated  with  the 
larger  observed  errors).  This  consistency  is  a  necessary  condition  to  allow  the  ETKF  to  accurately  predict  the  impact  of  the  observations  in 
reducing  model  errors.  These  ETKF  skills  were  then  tested  by  comparing  the  vertically  averaged  predicted  temperature  corrections  based  on 
the  local  measurements  with  the  vertically  averaged  magnitude  of  the  observed  changes  between  two  consecutive  forecasts  (before  and  after 
assimilating  the  data).  Results  showed  the  system  had  skills  to  accurately  predict  RNCOM  errors  and  the  impact  of  observation  networks  in 
reducing  the  error  of  model  state-variables. 

I.  THE  HIGH  RESOLUTION  NAVY  COASTAL  OCEAN  MODEL 

The  relocatable  Navy  Coastal  Ocean  Model  (RNCOM)  is  based  on  a  standardized  development  and  an  efficient  configuration 
management  to  facilitate  transitions  of  new  tools  and  real-time  configurations  of  regional  high  resolution  (order  1  km)  ocean 
predictions  [1].  The  physics  and  numerical  procedures  ofNCOM  are  based  on  the  Princeton  Ocean  Model  (POM)  and  a  Sigma/Z- 
level  Model  (SZM).  It  solves  a  three-dimensional,  primitive  equation,  baroclinic,  hydrostatic  and  free  surface  system  using  a 
cartesian  horizontal  grid,  a  combination  of  a/z  level  (i.e.  bottom-following/constant  depth)  vertical  grid  and  implicit  treatment  of 
the  free  surface.  Horizontal  eddy  coefficients  are  calculated  based  on  maximum  grid-cell  Reynolds  number  criteria,  and  vertical 
eddy  coefficients  are  calculated  using  the  Mellor-Yamada  Level  2  turbulence  closure  scheme.  For  meso-scale  real-time 
applications,  boundary  conditions  are  taken  from  an  operational  run  of  the  global  NCOM  (GNCOM).  The  global  model 
assimilates  satellite  altimetry  and  sea-surface  temperature  (SST)  data  using  a  combination  of  model  analysis  and  data.  Since  the 
global  NCOM  does  not  include  tides,  these  are  explicitly  inserted  in  the  RNCOM  nests  through  the  boundary  conditions  and  local 
forcing  terms. 

The  data  assimilation  is  performed  using  the  Navy  Coupled  Ocean  Data  Assimilation  (NCODA)  system  [2].  It  is  based  on  a 
three-dimensional  multivariate  optimum-interpolation  (MVOl)  data  assimilation  system,  that  will  in  the  future  apply  variational 
methods  (3DVAR  and  4DVAR),  cycling  in  real-time  to  provide  new  analysis  and  model  updates  of  the  ocean  state  variables 
(temperature,  salinity,  velocity  and  sea  surface  height).  Additional  capabilities  built  into  the  system,  includ  flow-dependent 
background-error  correlations  and  background -error  variances  that  vary  in  space  and  evolve  from  one  analysis  cycle  to  the  next.  It 
also  includes  a  data  quality-control  system  with  multivariate  analysis  using  feedback  of  forecast  fields  and  prediction  errors  in  the 
quality  control  of  new  observations. 

Typical  runs  of  these  systems  use  72  or  96  hours  forecast  ranges,  horizontal  grids  of  3km  resolution  and  50  vertical  levels,  of 
which  the  upper  30  are  sigma  layers.  The  topography  is  usually  taken  from  the  Naval  Research  Laboratory  global  2  minute 
resolution  ocean  bathymetry  data  base  (NRL  DBDB2).  Atmospheric  forcing  usually  consists  of  hourly  fields  of  sea  level  air 
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pressure,  wind  stress,  solar  and  long  wave  radiation,  and  2m  air  temperature  and  humidity  from  1 5-km  resolution  Coupled  Ocean- 
Atmosphere  Modular  Prediction  System  (COAMPS)  analysis/forecast  runs  and  interpolated  to  the  ocean  model  grids.  Operational 
runs  start  daily  24  hours  prior  to  analysis  time  from  a  snapshot  of  the  previous  run.  The  first  24  hours  are  used  for  the  model 
initialization  by  sequentially  adding  the  increments  computed  by  NCODA  (slow  insertion  of  model  corrections),  such  that  at  the 
analysis  time  (hour  0)  the  model  fields  reproduce  the  best  analysis  estimates  as  delivered  by  NCODA.  Model  outputs  are  then 
post-processed  to  standard  levels  and  made  available  through  netCDF  files.  These  files  can  include  a  full  or  partial  range  of 
variables  from  the  native  model  output. 

The  NCODA  interpolation  problem  is  formulated  as: 

+  R)-‘[y  -  H(xJJ  (1) 

where  x,  is  the  analysis  vector,  Xb  is  the  time  dependent  background  vector,  Pb  is  a  time  dependent  background  error 
covariance  matrix,  H  is  the  observational  functional  operator,  R  is  the  observation  error  covariance  matrix,  and  y  is  the 
observation  vector  at  a  specific  update  cycle.  The  forward  model  H  converts  forecast  model  variable  to  an  observed  variable  and, 
as  used  here,  is  a  spatial  interpolation  of  the  forecast  model  grid  to  the  observation  location  performed  in  three  dimensions. 
Therefore,  HPbH^^  is  the  background -error  covariance  between  observation  locations,  and  PbH^  the  error  covariance  between 
observation  and  grid  locations.  The  quantity  {y  -  H(Xb)}  is  referred  to  as  the  innovation  vector  and  x,  -  Xb  is  the  increment  (or 
correction)  vector.  Observations  are  assimilated  close  to  their  measurement  times  within  the  update-cycle  time  window  by 
comparison  against  time-dependent  background  fields  using  the  first-guess  at  appropriate  time  (FGAT)  method.  The  ocean 
variables  are  analyzed  simultaneously  in  three  dimensions  such  that  the  observation  vector  contains  all  of  the  synoptic 
temperature,  salinity  and  velocity  observations  that  are  within  the  geographic  and  time  domains  of  the  forecast  model  grid  and 
update  cycle.  The  velocity  increments  are  forced  to  be  in  geostrophic  balance  with  the  geopotential  increments  which,  in  turn,  are 
in  hydrostatic  balance  with  the  temperature  and  salinity  increments.  Prior  to  an  analysis  the  innovation  vector  is  normalized  by 
the  background  error  at  the  observation  locations,  and  after  an  analysis  the  increment  vector  is  scaled  by  the  background  error  at 
the  grid  locations. 

Typical  implementations  of  NCODA  use  more  than  30  vertical  levels,  with  the  background  error  variances  being  computed 
from  the  increments  using  a  recursive  filter  model  with  a  time  constant  of  10  days  and  imposing  geostrophic  cross-correlations  on 
the  velocity  errors  computed  from  the  mass  variables.  The  first  baroclinic  Rossby  radius  of  deformation  is  usually  selected  as  the 
local  spatial  correlation  length  scale  for  horizontal  interpolations  between  observations  and  observations  and  grid  locations.  The 
vertical  correlation  length  scale  are  computed  from  local  background  density  vertical  gradients.  The  system  uses  a  First  Guess  at 
Appropriate  Time  (FGAT)  window  of  24  hours,  usually  set  as  12  hours  around  analysis  time.  When  no  data  is  available  for  long 
periods  of  time,  error  variances  are  relaxed  towards  climate  variability.  The  NCODA  system  also  includes  a  complete  quality 
control  system  that  assesses  the  error  probability  of  each  individual  observation  and  the  final  profile.  These  profile  data  are  then 
processed  with  the  model  runs  to  produce  data  match-up  files  that  are  used  to  run  diagnostics  of  the  model  forecast  errors  and 
ensemble  performances. 

For  this  experiment  the  RNCOM  was  run  on  the  domain  15N  1 13E,  I8.3N  121 E  (see  Fig.  1),  from  1  March  to  4  April  2009. 
The  boundary  conditions  were  taken  from  the  Global  run  of  NCOM  and  atmospheric  forcing  from  the  regional  15km  COAMPS 
[3]  run  by  the  Fleet  Numerical  Meteorological  and  Oceanographic 
Center  (FNMOC).  Tides  were  introduced  at  the  boundaries  and  through 
local  tidal  potentials.  The  horizontal  grid  spacing  was  set  at  3km  and 
used  50  sigma/z  levels  in  the  vertical.  Observations  from  global  data 
bases  were  assimilated  over  the  full  period,  including  satellite  altimetry 
and  SST.  From  17  March  to  the  end  of  the  simulation  period  there  was 
also  one  glider  in  the  area  providing  regular  profiles  and  ship 
observations  during  the  period  21-23  March. 

Local  dynamics  where  characterized  by  the  evolution  of  a  cold  core 
gyre  in  the  middle-northern  portion  of  the  domain,  typical  in  the  region 
for  this  time  of  the  year,  intruding  westward  in  depth  towards  the  coast. 

Further  south,  the  temperature  fields  where  more  homogeneous  and 
stable  throughout  the  simulations  period.  Overall  in  the  area  there  was  a 
sharp  well  defined  upper  layer,  with  a  steep  thermocline  ranging 
between  50  to  1 00m  depth.  The  tides  in  the  area  were  small  but  still  capable  of  some  perturbations  of  the  thermocline  within  the 
tidal  cycles,  suggesting  a  propagation  of  mild  internal  tide  interfacial  modes  in  the  region. 


The  system  responded  significantly  to  the  assimilation  of  local  observations  as  shown  in  Fig.2.  The  upper  bar  plot  in  this  figure 
shows  water  temperature  mean  errors  (bias)  for  19-23  March  averaged  through  out  the  domain  in  the  upper  400m.  The  middle 
plot  shows  the  value  of  the  maximum  temperature  bias  error  observed  in  the  upper  400m  and  the  lower  plot  shows  the  average  of 

the  profile  RMS  errors.  The  larger  errors  were  observed  at  the 
thermocline  levels  (50-1 00m  depths)  and  became  smaller  after 
21  March  after  the  assimilation  of  the  ship  observations.  In  these 
bar  plots  the  green  bars  correspond  to  the  comparisons  between 
observations  and  24-48  hour  forecasts  (prior  to  assimilating  the 
test  data)  and  use  the  same  observations  as  the  blue  bars  that 
compare  the  data  with  the  consecutive  run  of  the  model  using  the 
0-24  hours  forecasts  after  assimilating  the  independent  test  data. 
By  comparing  these  two  bars  we  can  get  a  proxy  of  the  impact  of 
the  observations  that  were  assimilated  between  the  two 
consecutive  runs. 

If  we  define  A  =  Ci  (before  ass.)  -  ei+,  (after  ass.>.  where  e, 
corresponds  to  the  forecast  mismatch  RMS  before  the 
assimilation  of  test  data  and  ej+i  to  the  forecast  mismatch  after 
the  assimilation  of  the  data  (corresponding  to  the  green  and  blue 
bars  displayed  in  the  lower  plot  of  Fig.2),  we  can  expect  the 
daily  mean  A  to  be  positive  during  the  period  shown  in  the  picture 
(i.e.  observations  had  a  positive  impact  in  reducing  forecast 
errors).  The  overall  mean  A  was  positive  (with  a  0.04®C  average) 
and  had  an  RMS  of  0.1 5®C.  This  shows  the  system  had  a  small 
but  positive  response  to  the  assimilation  of  local  observations  that  should  be  reproduced  by  the  ensemble  based  error  models  (e.g. 
[4],  [5]),  as  it  will  be  expanded  in  the  next  sections. 

11.  ERROR  MODELING  USING  THE  ENSEMBLE  TRANSFORM 

The  errors  in  the  RNCOM  variable  forecasts  are  determined  by  multiple  sources  of  uncertainty  as  detailed  in  [4].  They  are 
associated  with  the  model  initialization  and  boundary  conditions,  numerical  approximations,  modeling  strategies,  impact  of 
under-sampling  in  the  assimilation  process  and  unresolved  scales.  To  address  the  initialization  error  this  work  uses  the  ensemble 
transform  (ET)  method  to  produce  perturbations  at  each  analysis  time  following  the  approach  detailed  in  [5]  and  [6].  The  ET  uses 
the  best  available  estimate  of  analysis  error  covariance  to  transform  forecast  perturbations  into  analysis  perturbations  by  finding  K 
distinct  linear  combinations  of  K  forecast  perturbations.  For  operational  implementations,  the  NCODA  analysis  error  variance  is 
used  to  renormalize  each  new  ET  ensemble.  The  ET  analysis  perturbations  are  then  added  to  the  best  available  analysis  (in  this 
case  produced  by  the  RNCOM-NCODA)  to  create  K  initial  states.  These  K  initial  states  are  then  integrated  forward  in  time  using 
the  non-linear  model  to  create  the  next  ensemble  forecast  that  will  be  the  starting  point  for  the  ET  transformation  that  will 
generate  the  initial  conditions  for  the  subsequent  ensemble  forecast  once  the  subsequent  analysis  is  available.  As  such,  the  ET 
ensemble  generation  scheme  is  a  cycling  scheme  with  strong  similarities  to  a  breeding  scheme.  This  technique  does  not  account 
for  additional  error  sources  that  could  develop  during  the  forecast  period  through  the  boundaries  and/or  through  unrepresented 
numerical  errors  and  approximations. 

The  RNCOM  ensembles  also  represent  the  errors  due  to  uncertain  atmospheric  forcing  by  using  an  ensemble  of  atmospheric 
perturbations  and  allow  each  ocean  run  to  have  an  independent  atmospheric  forcing,  therefore  augmenting  the  domain  spanned  by 
the  the  ensemble  of  initial  states.  This  is  done  using  atmospheric  forcing  perturbations  as  detailed  in  [7],  based  on  spatially- 
varying  time  shifts  of  the  atmospheric  forecast,  with  a  choice  of  parameters  to  provide  a  well  developed  spread  of  atmospheric 
perturbations.  This  method  is  mostly  adequate  when  predicted  atmospheric  fields  contain  the  forecast  features  of  interest,  but  they 
are  misplaced  in  space  and  time. 

The  ensembles  resulting  from  combining  the  ET  and  the  perturbed  atmospheric  forcing  are  then  used  to  predict  how 
uncertainties  of  the  ocean  fields  will  evolve  in  space  and  time.  Typical  implementations  of  ensembles  use  40  independent 
members  perturbed  from  a  control  run  that  performs  the  full  data  assimilation  cycle,  where  each  one  of  the  simulations  uses 
independent  atmospheric  forcing  and  perturbed  initial  conditions  as  detailed  above. 

For  this  experiment  off  the  East  Philippines,  the  system  was  implemented  using  32  ensemble  members  centered  around  a 
control  run  assimilating  the  local  data.  The  ensemble  spread  of  these  runs  was  reset  at  each  running  cycle  using  the  NCODA 
analysis  error  variances.  Fig.  3  shows  an  example  of  the  temperature  fields  at  50m  depth.  The  map  on  the  left  corresponds  to  the 
temperature  estimates  at  00:00  of  23  March  and  the  map  on  the  right  shows  the  corresponding  ensemble  spread  (i.e.  the  standard 
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Fig.  2  -  Comparisons  of  upper  400m  mean 
temperature  forecast  errors  before  assimilating  test 
data  (48  hours  forecasts  -  green  bars  )  with  the 
errors  after  assimilating  test  data  (24  hours  foreeasts 
-  blue  bars). 


deviation  of  the  ensemble  member  temperatures  around  the  ensemble  mean).  Note  regions  of  larger  predicted  errors  concur  with 
the  areas  of  larger  gradients  and  time  variability.  The  layer  of  50m  corresponds  roughly  to  the  upper  level  of  the  thermocline  and 

was  the  most  dynamic  and  uncertain  through  out  the 
experiment.  Since  we  have  an  ensemble  population 
larger  than  30,  using  the  central  limit  theorem  one  can 
assume  there  will  be  a  normally  distributed  envelope 
variable  that  can  be  estimated  by  the  ensemble  (i.e.  the 
ensemble  members  are  normally  distributed).  Based 
on  the  ensemble  spread  as  shown  in  the  figure  for  a 
normal  distribution  and  using  the  number  of  samples 
equal  to  32,  we  expect  the  ensemble  mean 
temperatures  errors  to  be  smaller  than  O.T^’Cg  up  to 


95%  when  the  ensemble  spread  is  2°C  and  0.3®C  when 
the  ensemble  spread  is  I^C.  This  analysis  can  provide 
an  immediate  application  of  the  ensemble  as  a  proxy 
error  estimate.  Fig.  4  shows  how  the  predicted  ensemble  spread  compared  with  the  magnitudes  of  the  0-24  hour  forecast 
mismatches  (|ej|).  The  upper  scatter  plots  show  the  temperature  and  salinity  ensemble  spread  vs  the  observed  |e,|  for  days  17 
March  (before  the  ship  data  being  assimilated)  and  for  23  March  (during  ship  observations).  The  maps  below  each  scatter 
diagram  show  the  mean  ensemble  spread  at  the  analysis  time  and  the  white  crosses  show  the  locations  were  data  was  available 
and  used  to  compute  the  model  mismatches  ej. 


Fig.  3  -  RNCOM  and  Ensemble  estimates  of  water  temperature  at 
50m  depth  for  00;00  March,  23  2009 
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Fig.  4  -  The  upper  scatter  plots  show  the  TEMPERATURE  and  SALINITY  ensemble  spread  vs  the  observed  forecast 
errors  (e)  for  days  March  17  on  the  left  (before  the  ship  data  being  assimilated)  and  for  March  23  on  the  right  (last 
day  of  ship  observations).  The  maps  below  each  scatter  diagram  show  the  mean  ensemble  spread  at  the  analysis  time 
and  the  white  crosses  show  the  locations  were  data  was  available  and  used  to  compute  the  model  mismatches.  The 
color  of  each  individual  estimate  displayed  as  the  small  dots  corresponds  to  the  depth  of  the  observation,  accordingly 
to  the  color  code  displayed  in  the  bars  on  the  right  (in  meters). 

Since  ensemble  variance  is  designed  to  be  a  predictor  of  the  true  variance  it  is  not  comparable  to  single  measurements  of 
squared  model-data  mismatches.  To  account  for  this  the  observed  squared  model  mismatches  were  used  to  estimate  observed 
error  variances  by  ordering  the  data  pairs  from  smallest  ensemble  variance  to  largest  ensemble  variance  and  then  grouping  the 
data  pairs  into  10  approximately  equally  populated  bins.  Within  each  bin,  the  bin  averaged  squared  model-data  mismatches  were 
computed  together  with  the  bin-averaged  ensemble  variance.  Since  both  of  these  quantities  are  sample  variances,  they  are 
comparable.  In  order  to  keep  the  state-variable  units  this  binning  and  averaging  was  done  using  the  magnitudes  of  the  forecast 
mismatches  and  the  ensemble  standard  deviation  rather  than  the  variances.  These  quantities  are  displayed  in  Fig.4  as  the  large  red 
dots  in  the  scatter  diagrams.  For  the  ensemble  to  be  accurate,  these  large  red  dots  should  be  aligned  along  a  positive  slope  and 
ideally  along  the  main  diagonal,  highlighted  as  a  black  line  on  the  plots.  The  skill  is  represented  by  the  metric  “SprSkil” 
corresponding  to  the  bin  correlation  changing  between  0  and  1 .  A  value  of  SprSkil=l  represents  a  perfect  spread-skill  relation  and 
values  above  0.5  can  be  considered  good  ensemble  spread-skill  (i.e.  the  ensembles  have  the  ability  to  correctly  separate  the  small 


ensemble  spread  values  correlated  with  the  smaller  observed  errors,  from  the  larger  ensemble  spread  values  correlated  with  the 
larger  observed  forecast  errors).  Another  relevant  metric  is  the  mean  ratio  between  measured  magnitudes  of  model-data 
mismatches  and  ensemble  spread,  (Err/Std  in  the  plots),  which  represents  the  ensemble  skill  in  terms  of  correctly  predicting  the 
magnitudes  of  the  errors.  Values  higher  than  1  indicate  that  the  ensemble  is  under-dispersive  (under-predicts  the  error  magnitude 
and  error  bins  are  above  the  diagonal  line)  and  values  below  1  indicate  that  the  ensemble  is  over-dispersive  (over  predicts  forecast 
error  magnitudes  such  that  error  bins  are  below  the  main  diagonal).  We  can  see  that  for  temperature  the  ensemble  was  over- 
dispersive  and  under-dispersive  for  salinity. 

The  assimilation  of  ship  observations  in  days  21-22  March  appears  to  have  reduced  the  ensemble  spread  in  temperature 
(smaller  predicted  errors)  and  the  ratio  Err/Std  became  closer  to  1.  For  salinity  the  ensemble  spread  increased  along  the  sharp 
fronts  and  became  slightly  more  under-dispersive.  The  larger  differences  between  the  ensemble  estimates  and  the  observed 
model-data  mismatches  were  seen  between  the  50m  and  150m  isobaths,  near  the  mixed  layer  depth. 


Ill.  Targeting  Observations  to  Reduce  Uncertainties 


The  problem  of  identifying  the  best  location  for  deploying  mobile  observation  platforms  is  often  called  the  adaptive  sampling 
or  targeting  observation  problem.  The  importance  of  this  topic  has  been  heightened  in  oceanic  applications  by  the  advent  of 
Autonomous  Underwater  Vehicles  (AUVs).  Planning  the  missions  of  these  platforms  includes  updating  reference  way-points  on 
regular  schedules  such  that  one  must  solve  the  adaptive  sampling  problem  before  some  critical  decision  time.  This  work  uses  the 
Ensemble  Transform  Kalman  Filter  (ETKF)  technique  proposed  by  [8]  and  applied  by  [9]  to  adaptive  sampling  in  atmospheric 
modeling  applications.  The  ETKF  uses  the  error  estimates  produced  by  the  ensemble  forecast  as  detailed  in  the  previous  section, 
and  rapid  low  rank  solutions  of  the  Kalman  filter  equations  to  solve  the  targeting  observation  problem.  The  first  step  of  the 
method  is  to  identify  the  areas  of  interest  inside  the  simulation  domain  here  named  as  the  target  box  and  a  forecast  time  called  a 
verification  or  target  time  at  which  one  wishes  the  adaptive  supplemental  observations  taken  at  an  earlier  observation  time  to 
produce  a  maximum  effect  defined  by  a  cost  function.  For  the  present  example  the  cost  functions  to  be  minimized  were  the 
ensemble  forecast  variance  (taken  as  a  proxy  of  the  forecast  error  magnitude)  for  ocean  temperature  profiles. 


This  technique  assumes  the  analysis  error  covariance  at  the  observation  time  can  be  estimated  by  _  j  where  A^(NjcK)  is 

the  matrix  with  the  ensemble  N  state-variables  forecasts  at  the  observation  time  and  K  is  the  number  of  ensemble  members. 
However,  these  perturbations  should  be  consistent  with  the  best  available  guess  of  the  error  variance  of  an  analysis  made  using  all 
of  the  observations  apart  from  the  observations  that  will  be  targeted.  To  account  for  this  constraint  and  as  described  in  [9]  a 
Transformation  matrix  T*^  is  applied  so  that,  X'^=X^T^  The  transformation  matrix  T'^  is  computed  using  the  ET  technique  and  a 
guess  of  the  analysis  error  covariance  associated  with  the  routine  observations  based  on  the  ensemble  predicted  errors  at  those 
locations.  The  transformed  perturbations  X*^  are  then  used  produce  a  new  error  covariance  P'g.  For  this  present  study  this 
transformation  matrix  was  taken  as  an  identity  matrix,  hence  considering  only  the  impact  of  observation  platforms  under  the 
assumption  that  there  were  no  routine  data.  The  posterior  analysis  error  covariance  P^etur  after  assimilating  the  ith  feasible 
deployment  of  adaptive  observations  will  then  be  given  by: 
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where  Hq  describes  the  mapping  from  the  model  state  vector  to  the  observation  vector  normalized  by  the  inverse  square  root  of 
the  observation  error  covariance  associated  with  the  /th  feasible  deployment. 


Using  this  result,  we  can  now  estimate  where  Xf  is  the  ensemble  state  matrix  after  including  the  targeted 

observations.  The  columns  of  can  be  interpreted  as  transformed  ensemble  perturbations  such  that  their  covariance  gives  the 
analysis  error  covariance  at  the  observation  time  assuming  that  the  /th  deployment  of  targeted  observations  had  been  assimilated 
i.e.:X;=X^T„where  Tetkf  is  now  determined  by  the  eigenvectors  and  eigenvalues  of  the  projections  of  the  magnitude  of 
the  analysis  perturbations,  corresponding  to  the  possible  observations  strategies,  into  .  As  shown  in  [8],  this  transform  will 
not  change  along  the  forecast  cycle;  therefore  the  same  transform  matrix  can  be  applied  to  the  forecast  perturbations  to  estimate 
the  error  covariance  at  the  verification  time  P/ .  These  results  are  hereafter  used  to  infer  the  impact  of  observations  taken  at  a 
given  observations  time  on  the  error  variance  (computed  as  the  standard  deviation  of  )  of  the  selected  variables  at  a  later  time 
and  at  specific  target  locations  H\.  Therefore,  the  RMS(H\  -  H\  X°f )  should  be  a  consistent  estimator  of  the  RMS(Aob»er>ed)- 
To  find  which  of  all  feasible  deployments  of  targeted  observations  has  a  larger  impact  in  minimizing  the  selected  cost  function 
(e.g.  the  ensemble  spread  of  selected  variables),  one  can  estimate  the  reduction  in  ensemble  spread  for  each  feasible  grid  point  of 
the  ensemble  domain,  taken  as  a  single  profile  measurement,  through  a  range  of  selected  depths  (for  the  present  example  0  to 
1000m  to  simulate  a  glider  profile  observation)  and  display  the  resulting  information  in  the  so-named  summary  maps  as  shown 


here  in  Fig.5.  Either  by  inspection  or  using  automated  tools  to  follow  the  areas  of  higher  gains  in  both  time  and  space  (e.g.  [9]  and 
[10]),  one  can  choose  from  the  possible  deployments  those  that  will  produce  the  most  desired  effect.  To  test  these  maps,  target 
locations  were  selected  along  the  subsequent  day  observation  sites  (red  dots  in  the  map)  such  that  we  can  compare  the  changes  in 

ensemble  spread  to  the  changes  in  model  data  mismatches.  The 
summary  map  displayed  in  Figure  5  refers  to  the  23  March  run, 
summarizing  the  suggested  observation  guidance  from  00:00  24  March 
to  00:00  25  March,  in  reducing  the  mean  temperature  profile  error  over 
the  target  box  at  a  forecast  range  of  48  hours  after  the  analysis  (i.e.  00:00 
25  March).  As  anticipated,  the  area  of  higher  impact  in  reduction  of 
temperature  error  overlaps  with  the  larger  ensemble  spread  and  fronts  as 
displayed  in  Fig. 3,  that  were  closer  to  the  target  points,  suggesting  that 
an  accurate  characterization  of  the  frontal  system  would  be  the  one  to 
reduce  most  the  errors  at  the  testing  locations.  The  white  dots  show 
where  observations  actually  took  place  during  the  observations  window 
00:00  24  March  24  to  00:00  25  March. 

To  positively  validate  the  summary  maps  we  require  that  the  absolute 
value  of  the  ETKF  estimates  of  the  model  corrections  over  the  target 
locations,  after  assimilating  the  available  observations,  be  consistent 
with  the  actual  forecast  error  reduction.  One  way  to  demonstrate  this 
property  is  to  take  the  actual  measured  model  innovations  and  use  the 
ETKF  to  predict  their  impact  at  the  target  or  verification  time,  and  then 
use  other  independent  data  to  verify  that  the  model  forecast  error 
variances  were  reduced  accordingly,  as  discussed  below.  For  this 
property  to  happen  it  will  be  necessary  that  the  absolute  value  of  the 
differences  between  the  control  runs  at  the  verification  time,  before  and 
after  assimilating  the  observations  to  be  consistent  (i.e.  well  correlated) 
with  the  ETKF  estimates  at  the  target  time  for  model  corrections, 
olution  of  the  ocean-state  due  to  any  possible  changes  in  the  atmospheric 
forcing  and  boundary  condition.  If  they  are  well  correlated  then  one  can  expect  the  suggestions  provided  by  the  summary  maps 
should  be  also  accurate,  since  the  ETKF  is  consistently  predicting  the  changes  in  forecast  error.  For  this  purpose  Fig.  6  shows  the 
changes  in  the  forecast  fields  before  and  after  the  assimilation  of  the  data  that  compare  well  with  the  error  corrections  the  ETKF 
predicted  based  solely  on  the  positions  where  observations  took  place,  for  23  March.  The  map  on  the  left  shows  the  difference 
between  the  48hours  forecast  of  the  run  from  23  March  with  the  00:00  snapshot  of  the  run  from  25  March.  The  main  cause  for 
differences  between  these  two  snapshot  (allowing  for  possible  updates  in  the  atmospheric  forcing  and  some  small  changes  in  the 
boundary  conditions)  was  the  assimilation  of 
the  data  from  the  locations  highlighted  as  the 
white  dots.  The  vertically  averaged  predicted 
temperature  corrections  (the  map  on  the  right) 
show  patterns  comparable  to  the  vertically 
averaged  magnitude  of  the  observed  changes 
between  the  two  consecutive  forecasts 
(before  and  after  assimilating  the  data).  Areas 
where  the  ETKF  analysis  predicts  larger 
corrections  are  concurrent  with  the  dynamic 
features  associated  with  the  frontal  systems 
one  would  expect  to  be  dynamically 
correlated  with  the  observation  sites.  On  the 
other  hand,  the  observed  corrections  were 
more  localized,  as  one  could  expect  from  the 
MVOl  based  on  pre-set  correlation  length 
scales.  The  small  area  with  large  observed  corrections  close  to  the  north  boundary  and  close  to  16N  1 14E  seems  to  have  been 
missed  by  the  ETKF,  due  either  to  data  other  than  profile  assimilation,  or  to  a  small  ensemble  spread  prediction  at  the 
observations  time.  The  good  qualitative  agreement  between  these  two  maps  suggests  the  ETKF  was  correctly  predicting  the 
impact  of  the  corrections  computed  by  the  NCODA  system  and  inserted  into  the  RNCOM  runs.  One  should  note  that  this 
agreement  exists  in  spite  of  the  fact  that  the  ETKF  was  applied  prior  to  the  actual  observation  time,  used  a  fraction  of  the 
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Fig.  6  -  Vertically  averaged  predicted  temperature  corrections  map  (on 
the  right)  and  the  equivalent  vertically  averaged  magnitude  of  the 
observed  changes  between  two  consecutive  forecasts,  before  and  after 
assimilating  the  data  (map  on  the  left). 


SUMMARY  MAR  FOR  DAY  23 


Fig.  5  -  Summary  Maps  for  T  profile  error 
reduction  showing  the  relative  impact  of  T-S 
independent  observations  during  the  window 
tau=0-24  in  reducing  the  T  errors  at  the  sites 
highlighted  as  red  dots,  corresponding  to  where 
observations  were  made  during  tau=24-48. 

The  white  crosses  show  the  places  where 
observations  were  actually  made  and 
assimilated  into  the  next  run  during  the 
window  tau=12-36 


through  the  available  observations,  up  to  the  dynamic  t 


ensemble  spread  as  an  estimate  of  the  local  observed  innovation  and  ensemble>based  covariances  to  estimate  the  overall 
corrections.  The  NCODA  system  uses  a  different  method  to  compute  the  correction  based  on  the  measured  differences  between 
the  observations  and  the  control  run  at  the  observation  location  and  the  nearest  time,  and  a  multi*variate  optimal  interpolation 
method  (e.g.  [2]),  with  pre-defined  correlation  length  scales  (~50km  in  the  present  case).  Between  the  two  runs  for  23  and  25 
March,  different  atmospheric  perturbations  and  boundary  conditions  were  another  source  contributing  to  the  observed  change  in 
the  control  runs  that  were  not  considered  by  the  ETKF.  This  limitation  might  be  overcome  in  future  systems  using  fully  coupled 
model  dynamics  and  coupled  data  assimilation,  such  that  ocean  observations  would  reduce  the  uncertainty  of  the  surface  forcing 
and,  in  principle,  account  for  the  uncertainty  in  surface  forcing  in  a  rigorous  way. 

A  more  quantitative  comparison  between  the  ETKF  predictions  and  observations  of  A  can  be  found  in  Fig.7  as  mentioned 
above.  The  mean  magnitudes  of  the  changes  in  model  forecast  mismatches  are  shown  as  the  average  values  of  the  large  red  dot 
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Figure  7  -  Scatter  diagrams  showing  the  observed  model  mismatch  changes  before  and  after  assimilating  data  (“A”) 
vs.  the  square  root  of  the  ETKF  predicted  changes  in  model  error  variance.  Colors  correspond  to  the  depth  in  meter 
were  comparisons  were  made  according  to  the  eolorbar  in  the  right.  The  diagram  in  the  left  shows  the  magnitude  of 
“A”  and  the  plot  on  the  right  show  the  signed  “A”  such  that  positive  values  correspond  to  decreasing  model-data 
mismatches  and  negative  values  to  increasing  mismatches  between  two  consecutive  runs  and  relative  to  the  same  test 
observations. 

pixels  of  the  left  plot  (note  that  the  same  binning  is  here  required  to  compare  ensemble  variances  with  local  observed  model 
mismatches  as  explained  above  for  Fig.4).  This  corresponds  to  a  necessary  condition  for  forecast  error  reduction  and  is  the 
quantity  that  is  directly  estimated  by  the  ETKF  and  used  for  adaptive  sampling  guidance,  l.e.  the  ETKF  will  look  for  the 
observations  that  will  produce  larger  changes  in  model  error  variance.  The  good  overall  correlation  between  this  observed 
quantity  and  the  ETKF  predictions  (see  details  below)  measured  by  the  parameter  SprSkil  =  0,98  and  the  good  mean  ratio 
between  the  observed  vs  predicted  values  (Err/Std  =  0.58),  shows  the  ETKF  was  accurately  tracking  these  changes  in  model  error 
variance.  However,  this  is  not  a  sufficient  condition  for  model  error  reduction.  Model  error  reduction  will  occur  only  when 
observations  are  taken  at  the  right  places  and  at  the  right  times  and  is  independent  of  the  ETKF  being  able  to  correctly  track 
model  error  variance.  The  actual  forecast  error  reduction/increase  can  be  estimated  by  the  average  of  the  pixels  of  the  plot  on  the 
right  (signed  A),  showing  the  mean  change  in  model  forecast  error  between  consecutive  forecasts  such  that  if  positive  the  errors 
were  getting  smaller  and  if  negative  the  errors  were  getting  larger.  As  the  observations  become  more  efficient  in  reducing  the 
errors  for  the  next  run  these  two  plots  should  converge.  This  feature  can  be  well  seen  by  using  smaller  sub-sets  of  observations, 
selected  from  the  sites  estimated  to  produce  the  larger  impacts  in  error  reduction.  The  plot  in  the  right  shows  a  poorer  correlation 
(SprSkil  =  0.61)  and  a  mean  ratio  of  Err/Std  =  0.13,  and  suggests  the  observations  during  the  period  17  March  to  2  April  were  not 
enough  to  persistently  produce  an  overall  noticeable  impact  in  reducing  the  model  forecast  errors  as  discussed  in  section  1.  The 
average  value  for  the  RMS(A)  was  0.15‘^C  and  MEAN(A  )  was  +0.04®C.  The  corresponding  mean  square  root  of  the  model  error 
variance  change  as  predicted  by  the  ETKF  was  0.28Deg.  This  suggests  the  ETKF  was  over  predicting  the  magnitude  of  the  actual 
changes  in  error  variance  and  that  only  roughly  30%  of  the  actual  changes  in  error  variance  were  actually  producing  a  forecast 
error  reduction. 

In  order  to  see  how  these  changes  occur  and  if  the  ETKF  was  accurately  separating  the  larger  error  variance  changes  from  the 
smaller  error  variance  changes  one  can  note  that  the  large  red  dots  from  both  plots  diagrams  were  correlated  with  the  predicted 
error  variances.  Ideally  these  large  red  dots  should  be  aligned  along  the  diagonal  line,  but  if  aligned  along  a  positive  slope  line 


they  will  already  show  the  ETKF  was  able  to  distinguish  the  large  error  changes  from  the  smaller  error  changes  and  consequently 
was  able  to  provide  accurate  adaptive  sampling  guidance.  These  slopes  are  estimated  by  the  bin  correlations  and  show  the  Spread- 
Skill  relation  (SprSkil  in  the  plots).  This  parameter  is  positive  for  both  plots  showing  that  the  ETKF  had  good  skill  in  separating 
the  errors  and  that  there  was  on  average  a  positive  correlation  between  the  error  reductions  at  the  observation  sites  with  the  error 
reduction  at  the  validation  sites,  where  the  model  mismatches  were  computed,  confirming  the  guidance  provided  by  the  ETKF  to 
be  accurate. 


IV.  Final  Remarks 

During  the  spring  2009  a  high- re  solution  NCOM  (RNCOM)  system  was  used  to  produce  mean  and  error  fields  of  meso-scale 
ocean  states,  northeast  of  the  Philippines.  These  fields  were  used  to  test  novel  techniques  to  guide  observation  assets  in  reducing 
predicted  model  errors.  The  dynamics  of  the  area  included  a  sharp  frontal  system  in  the  north,  intruding  colder  water  southward 
and  a  mild  stable  regime  in  the  southern  portion  of  the  simulation  domain.  Atmospheric  forcing  was  stable  throughout  the  test 
period  and  a  mild  tidal  regime  was  observed  and  predicted  in  the  area.  After  a  spin-up  period,  the  daily  mean  residual  prediction 
errors  in  temperature  profiles  were  typically  below  0.75°C,  with  larger  values  in  the  upper  and  lower  limits  of  the  thermocline 
(--50- 100m  depth),  where  stratification  was  stronger. 

An  ensemble  of  RNCOM  runs  was  used  to  predict  the  dynamics  of  the  forecast  errors  of  the  control  simulations.  These 
ensemble  runs  used  perturbed  initial  conditions  and  perturbed  atmospheric  forcing  as  the  main  sources  of  uncertainty.  At  each 
analysis  cycle,  the  initial  condition  perturbations  were  re-balanced  using  the  ET  technique  that  guarantees  the  ensemble  error 
covariance  at  each  analysis  time  to  match  closely  the  best  estimate  of  the  analyses  error  variance,  as  provided  by  the  NCODA 
system  after  the  objective  analysis  of  the  available  observations.  The  forecast  error  predicted  from  the  ensemble  runs  proved  to  be 
consistent  with  the  observed  forecast  errors  provided  up  to  the  data  representation  error,  with  a  good  spread-skill  relation 
displaying  a  bin  correlation  always  above  0.85. 

The  error  forecasts  produced  by  the  ensemble  system  were  then  used  to  produce  maps  summarizing  the  predicted  impact  of 
each  grid  point  taken  as  a  possible  observation  site  in  reducing  the  errors  of  temperature  profiles  over  selected  sites.  The  predicted 
relative  impact  of  the  available  observations  assimilated  into  the  model  runs  agreed  well  with  the  observed  dynamics  in  the  area 
and  on  average  with  model -data  mismatches  changes. 

Overall,  the  work  presented  above  showed  that  some  level  of  predictability  of  stochastic  environmental  variables  through 
numerical  modeling  can  be  achieved  by  Monte-Carlo  methods,  producing  ensemble-based  forecast  error  estimates  along  with  the 
predicted  state  variables,  using  a  limited  number  of  ensemble  runs.  The  focus  of  this  effort  was  to  use  these  estimates  in  target 
observation  problem  and  to  demonstrate  the  feasibility  and  anticipated  accuracy  of  the  ETKF  system  within  the  operational 
systems.  It  is  anticipated  that  more  validation  tests  will  be  required  to  complement  this  work  using  other  independent  data  sets  for 
validation,  larger  ensemble  populations  and  different  regions  and  dynamics.  These  extended  analyses  will  allow  further  testing  of 
the  summar}'  maps  for  operationally  relevant  variables  and  scenarios. 
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